Learning End-to-End Goal-Oriented Dialog with Multiple Answers
In a dialog, there can be multiple valid next utterances at any point. Present
end-to-end neural methods for dialog do not take this into account; they learn
under the assumption that, at any time, there is only one correct next
utterance. In this work, we focus on this problem in the goal-oriented dialog
setting, where there are different paths to reach a goal. We propose a new
method that uses a combination of supervised learning and reinforcement
learning to address this issue. We also propose a new and more effective
testbed, permuted-bAbI dialog tasks, created by introducing multiple valid
next utterances into the original-bAbI dialog tasks, which allows evaluation
of goal-oriented dialog systems in a more realistic setting. We show that
there is a significant drop in the performance of existing end-to-end neural
methods, from 81.5% per-dialog accuracy on original-bAbI dialog tasks to 30.3%
on permuted-bAbI dialog tasks. We also show that our proposed method improves
performance, achieving 47.3% per-dialog accuracy on permuted-bAbI dialog tasks.
Comment: EMNLP 2018. permuted-bAbI dialog tasks are available at
https://github.com/IBM/permuted-bAbI-dialog-task
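To make the testbed's evaluation concrete, here is a minimal sketch (with hypothetical helper names, not the released task code) of per-dialog accuracy when a turn may have several valid next utterances: a dialog counts as correct only if every turn's prediction matches one of that turn's valid candidates.

```python
# Minimal sketch: per-dialog accuracy with multiple valid next utterances,
# as in permuted-bAbI. A dialog is correct only if every turn's predicted
# utterance is among that turn's valid candidates.

def per_dialog_accuracy(dialogs):
    """dialogs: list of dialogs; each dialog is a list of
    (predicted_utterance, set_of_valid_utterances) pairs."""
    correct = sum(
        all(pred in valid for pred, valid in dialog)
        for dialog in dialogs
    )
    return correct / len(dialogs)

# Example: one dialog with two turns; turn 2 has two valid continuations.
dialogs = [
    [("what cuisine?", {"what cuisine?"}),
     ("which price range?", {"which price range?", "where should it be?"})],
]
print(per_dialog_accuracy(dialogs))  # 1.0
```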
Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning
Recently there has been a lot of interest in learning common representations
for multiple views of data. Typically, such common representations are learned
using a parallel corpus between the two views (say, 1M images and their English
captions). In this work, we address a real-world scenario where no direct
parallel data is available between two views of interest (say, V1 and V2) but
parallel data is available between each of these views and a pivot view (V3).
We propose a model for learning a common representation for V1, V2 and V3
using only the parallel data available between V1 and V3 and between V2 and
V3. The proposed model is generic and even works when there are n views of
interest and only one pivot view which acts as a bridge between them. There
are two specific downstream applications that we focus on: (i) transfer
learning between languages L1, L2, ..., Ln using a pivot language L and (ii)
cross-modal access between images and a language L1 using a pivot language
L2. Our model achieves state-of-the-art performance in multilingual document
classification on the publicly available multilingual TED corpus and promising
results in multilingual multimodal retrieval on a new dataset created and
released as a part of this work.
Comment: Published at NAACL-HLT 2016
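As an illustration of the bridge idea, here is a minimal sketch (assumed layer sizes, with a simple distance objective standing in for the correlational one; not the authors' code): each view is aligned with the pivot view V3, so V1 and V2 become comparable even though no V1-V2 parallel data exists.

```python
# Minimal sketch: align two views with a shared pivot view so that V1 and
# V2 become comparable without any direct V1-V2 parallel data.
import torch
import torch.nn as nn

class BridgeEncoders(nn.Module):
    def __init__(self, d1, d2, d3, d_common=128):
        super().__init__()
        self.enc1 = nn.Linear(d1, d_common)  # encoder for view V1
        self.enc2 = nn.Linear(d2, d_common)  # encoder for view V2
        self.enc3 = nn.Linear(d3, d_common)  # encoder for pivot view V3

    def align_loss(self, a, b):
        # pull paired representations together (a stand-in for the
        # correlation-based objective)
        return ((a - b) ** 2).sum(dim=1).mean()

model = BridgeEncoders(d1=300, d2=2048, d3=300)
v1, v3a = torch.randn(32, 300), torch.randn(32, 300)   # V1-V3 parallel batch
v2, v3b = torch.randn(32, 2048), torch.randn(32, 300)  # V2-V3 parallel batch
loss = (model.align_loss(model.enc1(v1), model.enc3(v3a))
        + model.align_loss(model.enc2(v2), model.enc3(v3b)))
loss.backward()  # both views are pulled toward the shared pivot space
```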
On End-to-End Learning of Neural Goal-Oriented Dialog Systems
Goal-oriented dialog systems assist users to complete tasks such as restaurant reservations and flight ticket booking. Deep neural networks have opened up the possibility of end-to-end learning of the entire goal-oriented dialog system directly from data. End-to-end learning enables automatic adaptation of the different parts of the dialog system accounting for how changes in one part affect the others. Since the entire dialog system is learned directly from the data, the design of the dialog system need not make any assumptions about the domain. This makes it possible to build dialog systems for new domains with different training data, without much domain-specific hand-crafting of the dialog system. With deep neural networks which can potentially capture the complexity of human dialog in natural language, learning neural goal-oriented dialog systems end-to-end holds the promise of bringing dialog systems into our everyday lives.
In this thesis, we identify some of the challenges in end-to-end learning of neural goal-oriented dialog systems and propose methods to address them. We look at four challenges:
1) The challenge posed by the presence of a large number of named entities in goal-oriented dialog tasks. We propose a method to build neural embeddings for named entities on the fly and store them in a key-value table with neural embeddings as keys and the actual named entities as values. The proposed method allows for comparison and retrieval, using neural embeddings as well as actual named entities, which leads to significant improvement in performance, especially in the presence of out-of-vocabulary named entities. (A minimal sketch of this key-value idea appears after this abstract.)
2) The challenge of performing supervised learning of goal-oriented dialog systems with multiple valid next utterances. We propose a method to learn to use different parts of the neural network to encode different predictions of the next utterances with learning of one not interfering with the learning of the others. Our experiments show considerable improvement in the generalization performance.
3) The challenge of handling new user behaviors during deployment of a trained dialog system. We propose a method that learns to anticipate failures and efficiently transfers dialogs to human agents in order to make sure the overall task success of the users remains high. Our experiments show that using our proposed method it is possible to achieve very high user task success while minimally using human agents.
4) The challenge of requiring large amounts of training data for each new dialog task of interest. We show that by selectively learning from a related task's data that is already available, we can improve the performance on a new task of interest that has only a limited amount of training data.
PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169752/1/rjana_1.pd
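Returning to challenge 1, here is a minimal sketch of the on-the-fly key-value idea (hypothetical encoder and entity names; not the thesis code): an out-of-vocabulary entity is encoded into an embedding that serves as the key, the entity string is the value, and retrieval can use either embedding similarity or exact match.

```python
# Minimal sketch: build embeddings for named entities on the fly and store
# them in a key-value table (embedding -> actual entity string).
import torch
import torch.nn as nn

char_vocab = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz ")}

class CharEncoder(nn.Module):
    """Encodes an entity string into a fixed-size embedding (the key)."""
    def __init__(self, dim=64):
        super().__init__()
        self.emb = nn.Embedding(len(char_vocab) + 1, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, name):
        ids = torch.tensor([[char_vocab.get(c, 0) for c in name.lower()]])
        _, h = self.rnn(self.emb(ids))
        return h[-1, 0]  # final hidden state as the entity embedding

encoder = CharEncoder()
keys, values = [], []             # the key-value table
for entity in ["Taj Mahal Hotel", "Flight AI 101"]:
    keys.append(encoder(entity))  # neural embedding as key
    values.append(entity)         # actual named entity as value

# Retrieval by embedding similarity (works even for OOV entities).
query = encoder("taj mahal hotel")
sims = torch.stack([torch.cosine_similarity(query, k, dim=0) for k in keys])
print(values[int(sims.argmax())])  # -> "Taj Mahal Hotel"
```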
A Correlational Encoder Decoder Architecture for Pivot Based Sequence Generation
Interlingua based Machine Translation (MT) aims to encode multiple languages
into a common linguistic representation and then decode sentences in multiple
target languages from this representation. In this work, we explore this idea
in the context of neural encoder-decoder architectures, albeit on a smaller
scale and without MT as the end goal. Specifically, we consider the case of
three languages or modalities X, Z and Y wherein we are interested in
generating sequences in Y starting from information available in X. However,
there is no parallel training data available between X and Y, but training
data is available between X & Z and between Z & Y (as is often the case in
many real-world applications). Z thus acts as a pivot/bridge. An obvious
solution, which is perhaps less elegant but works very well in practice, is
to train a two-stage model which first converts from X to Z and then from Z
to Y. Instead, we explore an interlingua-inspired solution which jointly
learns to do the following: (i) encode X and Z to a common representation and
(ii) decode Y from this common representation. We evaluate our model on two
tasks: (i) bridge transliteration and (ii) bridge captioning. We report
promising results in both these applications and believe that this is a step
in the right direction towards truly interlingua-inspired encoder-decoder
architectures.
Comment: 10 pages
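To make the joint objective concrete, here is a minimal sketch (toy shapes and mean-squared losses are assumptions; not the paper's code) of training the two parts together and then decoding Y directly from X at test time:

```python
# Minimal sketch of the correlational encoder-decoder objective:
# (i) encode X and Z into a shared space; (ii) decode Y from encoded Z.
import torch
import torch.nn as nn

d = 64
enc_x, enc_z = nn.Linear(100, d), nn.Linear(100, d)  # encoders for X and Z
dec_y = nn.Linear(d, 100)                            # decoder for Y

x, z = torch.randn(32, 100), torch.randn(32, 100)    # X-Z parallel batch
z2, y = torch.randn(32, 100), torch.randn(32, 100)   # Z-Y parallel batch

corr_loss = ((enc_x(x) - enc_z(z)) ** 2).mean()      # align X and Z encodings
dec_loss = ((dec_y(enc_z(z2)) - y) ** 2).mean()      # reconstruct Y from Z
(corr_loss + dec_loss).backward()

# Test time: X -> common representation -> Y, with no X-Y training data.
y_pred = dec_y(enc_x(torch.randn(1, 100)))
```

Compared with the two-stage pipeline (X to Z, then Z to Y), this path avoids committing to an intermediate Z sequence at inference time.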
Language Model-In-The-Loop: Data Optimal Approach to Learn-To-Recommend Actions in Text Games
Large Language Models (LLMs) have demonstrated superior performance in
language understanding benchmarks. CALM, a popular approach, leverages
linguistic priors of LLMs -- GPT-2 -- for action candidate recommendations to
improve the performance in text games in Jericho without environment-provided
actions. However, CALM adapts GPT-2 with annotated human gameplays and keeps
the LLM fixed during the learning of the text-based games. In this work, we
explore and evaluate updating the LLM used for candidate recommendation
during the learning of the text-based game as well, to mitigate the reliance
on human-annotated gameplays, which are costly to acquire. We observe that by
updating the LLM during learning using carefully selected in-game
transitions, we can reduce the dependency on human-annotated gameplays for
fine-tuning the LLMs. We conducted further analysis to study the
transferability of the updated LLMs and observed that transferring in-game
trained models to other games did not result in consistent transfer.
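Here is a minimal sketch of the in-the-loop update (the positive-reward selection rule and the "[ACT]" separator are illustrative assumptions, not CALM's exact recipe): the candidate-recommendation LM is fine-tuned on transitions the agent itself collects, rather than only on human-annotated gameplays.

```python
# Minimal sketch: update the action-candidate LM on carefully selected
# in-game transitions collected while the RL agent plays.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
opt = torch.optim.Adam(lm.parameters(), lr=1e-5)

# In-game transitions: (observation, action, reward).
transitions = [
    ("You are in a dark room. A door is to the north.", "open door", 1.0),
    ("The door is locked.", "sing loudly", 0.0),
]

for obs, action, reward in transitions:
    if reward <= 0:  # keep only selected (here: rewarded) transitions
        continue
    ids = tok(obs + " [ACT] " + action, return_tensors="pt").input_ids
    loss = lm(ids, labels=ids).loss  # standard LM fine-tuning objective
    opt.zero_grad(); loss.backward(); opt.step()
```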
Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi
Cooperative Multi-agent Reinforcement Learning (MARL) algorithms with
Zero-Shot Coordination (ZSC) have gained significant attention in recent years.
ZSC refers to the ability of agents to coordinate zero-shot (without additional
interaction experience) with independently trained agents. While ZSC is crucial
for cooperative MARL agents, it might not be possible for complex tasks and
changing environments. Agents also need to adapt and improve their performance
with minimal interaction with other agents. In this work, we show empirically
that state-of-the-art ZSC algorithms have poor performance when paired with
agents trained with different learning methods, and they require millions of
interaction samples to adapt to these new partners. To investigate this issue,
we formally defined a framework based on a popular cooperative multi-agent game
called Hanabi to evaluate the adaptability of MARL methods. In particular, we
created a diverse set of pre-trained agents and defined a new metric called
adaptation regret that measures the agent's ability to efficiently adapt and
improve its coordination performance when paired with a held-out pool of
partners on top of its ZSC performance. After evaluating several SOTA
algorithms using our framework, our experiments reveal that naive Independent
Q-Learning (IQL) agents in most cases adapt as quickly as the SOTA ZSC
algorithm Off-Belief Learning (OBL). This finding raises an interesting
research question: how can we design MARL algorithms with both high ZSC
performance and the capability of fast adaptation to unseen partners? As a
first step, we studied
the role of different hyper-parameters and design choices on the adaptability
of current MARL algorithms. Our experiments show that two categories of
hyper-parameters, controlling the training data diversity and the
optimization process, have a significant impact on the adaptability of Hanabi
agents.
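One plausible formalization of such a metric (an illustrative sketch; the paper's exact definition of adaptation regret may differ) accumulates the gap between a reference score and the agent's score while it adapts to a held-out partner:

```python
# Minimal sketch: adaptation regret as the cumulative gap between a
# reference score and the agent's per-episode score while adapting.

def adaptation_regret(scores, reference):
    """scores: per-episode scores while adapting to a new partner;
    reference: score of a fully adapted (or expert) pairing."""
    return sum(reference - s for s in scores)

# An agent that adapts quickly accumulates less regret.
fast = [15, 20, 23, 24, 24]  # Hanabi scores over adaptation episodes
slow = [10, 12, 15, 18, 20]
print(adaptation_regret(fast, 25))  # 19
print(adaptation_regret(slow, 25))  # 50
```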
How Should an Agent Practice?
We present a method for learning intrinsic reward functions to drive the
learning of an agent during periods of practice in which extrinsic task rewards
are not available. During practice, the environment may differ from the one
available for training and evaluation with extrinsic rewards. We refer to this
setup of alternating periods of practice and objective evaluation as
practice-match, drawing an analogy to regimes of skill acquisition common for
humans in sports and games. The agent must effectively use periods in the
practice environment so that performance improves during matches. In the
proposed method, the intrinsic practice reward is learned through a
meta-gradient approach that adapts the practice reward parameters to reduce
the extrinsic match reward loss computed from matches. We illustrate the
method on a simple grid world and evaluate it in two games in which the
practice environment differs from the match environment: Pong with practice
against a wall without an opponent, and PacMan with practice in a maze
without ghosts. The results show gains from learning during practice periods
in addition to match periods over learning in matches only.
Comment: AAAI-2020
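Here is a minimal sketch of the meta-gradient mechanics (a toy scalar problem with assumed reward and loss shapes; not the paper's implementation): the policy takes an inner gradient step on the learned practice reward, and the practice-reward parameters are then updated by differentiating the extrinsic match loss through that inner step.

```python
# Minimal sketch: meta-gradient learning of a practice (intrinsic) reward.
import torch

theta = torch.tensor(0.0, requires_grad=True)  # "policy" parameter
eta = torch.tensor(0.5, requires_grad=True)    # practice-reward parameter
alpha, beta = 0.1, 0.1                         # inner / outer step sizes

def practice_reward(theta, eta):
    # toy learned intrinsic reward: pulls the policy toward a learned target
    return -(theta - eta) ** 2

def match_loss(theta):
    # extrinsic objective, observable only during matches (optimum at 2.0)
    return (theta - 2.0) ** 2

for _ in range(500):
    # Practice: one gradient step on the intrinsic reward.
    g = torch.autograd.grad(-practice_reward(theta, eta), theta,
                            create_graph=True)[0]
    theta_new = theta - alpha * g
    # Match: adapt the practice reward so that practicing with it
    # reduces the match loss.
    meta_g = torch.autograd.grad(match_loss(theta_new), eta)[0]
    with torch.no_grad():
        eta -= beta * meta_g
        theta.copy_(theta_new)

print(round(theta.item(), 2), round(eta.item(), 2))  # both approach 2.0
```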